From Fundamentals to Advanced Deep Learning (With Theory)
PyTorch is an open-source deep learning framework developed by Meta (Facebook). It is widely used in research and industry due to its flexibility and ease of debugging.
Key characteristics:
- Dynamic, define-by-run computation graphs that make debugging straightforward
- A NumPy-like tensor API with GPU acceleration
- Built-in automatic differentiation through autograd
- A broad ecosystem of companion libraries such as torchvision and torchaudio
PyTorch can be installed using pip. Whether a CUDA-capable GPU can be used is checked at runtime rather than at install time.
pip install torch torchvision torchaudio
import torch
print(torch.__version__)
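A quick way to confirm the installation and select a device, using torch.cuda.is_available():

import torch

# Pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)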
A tensor is a multi-dimensional array, similar to NumPy arrays, but with GPU acceleration and automatic differentiation support.
x = torch.tensor([1, 2, 3])   # tensor from a Python list
y = torch.rand(2, 3)          # uniform random values in [0, 1)
z = torch.zeros(3, 3)         # all zeros
Tensors support mathematical operations and broadcasting.
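A small sketch of elementwise arithmetic and broadcasting, where a (2, 3) tensor is combined with a length-3 row:

a = torch.rand(2, 3)
b = torch.tensor([1.0, 2.0, 3.0])   # shape (3,) broadcasts across the rows of a
print(a + b)                        # elementwise sum, shape (2, 3)
print(a * 2)                        # scalar multiplication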
Autograd is PyTorch’s automatic differentiation engine. It tracks operations on tensors and computes gradients during backpropagation.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
y.backward()      # computes dy/dx
print(x.grad)     # 3 * x**2 = 12.0 at x = 2
This mechanism forms the backbone of neural network training.
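The same mechanism scales to several parameters at once. A minimal sketch with the weight and bias of a tiny linear model (the names w and b are illustrative):

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x_in = torch.tensor(3.0)
loss = (w * x_in + b - 2.0) ** 2   # squared error of the prediction against a target of 2.0
loss.backward()                    # fills w.grad and b.grad
print(w.grad, b.grad)              # 6.0 and 2.0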
Neural networks in PyTorch are created by subclassing torch.nn.Module.
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 1)   # one fully connected layer: 10 inputs, 1 output

    def forward(self, x):
        return self.fc(x)
The forward() method defines the computation flow.
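Assuming SimpleNet as defined above, a forward pass on a small batch looks like this; calling the model object invokes forward():

model = SimpleNet()
batch = torch.rand(4, 10)   # 4 samples, 10 features each
out = model(batch)          # equivalent to model.forward(batch)
print(out.shape)            # torch.Size([4, 1])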
Loss functions measure model error, while optimizers update model weights.
criterion = nn.MSELoss()                                   # mean squared error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent
Common losses include nn.MSELoss for regression, nn.CrossEntropyLoss for multi-class classification, and nn.BCELoss (or nn.BCEWithLogitsLoss) for binary targets.
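A brief sketch of nn.CrossEntropyLoss, which expects raw logits of shape (batch, classes) and integer class labels:

logits = torch.randn(4, 3)             # raw scores for 4 samples over 3 classes
targets = torch.tensor([0, 2, 1, 2])   # ground-truth class indices
ce = nn.CrossEntropyLoss()
print(ce(logits, targets))             # scalar loss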
The training loop performs forward pass, loss computation, backpropagation, and parameter updates.
# x_train and y_train are assumed to be prepared tensors matching the model's input and output sizes
for epoch in range(100):
    optimizer.zero_grad()                # clear gradients from the previous step
    outputs = model(x_train)             # forward pass
    loss = criterion(outputs, y_train)   # compute the error
    loss.backward()                      # backpropagation
    optimizer.step()                     # update the parameters
DataLoaders handle batching, shuffling, and efficient data loading.
from torch.utils.data import DataLoader, TensorDataset
dataset = TensorDataset(x_train, y_train)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
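Once the dataset is wrapped in a DataLoader, the training loop iterates over mini-batches instead of the full tensors:

for epoch in range(10):
    for xb, yb in loader:                # each iteration yields one shuffled batch of 32
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()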
During evaluation, training-specific layers like Dropout must be disabled.
model.eval()                      # switch Dropout and BatchNorm to inference behavior
with torch.no_grad():             # disable gradient tracking to save memory
    predictions = model(x_test)
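For a classifier, evaluation typically compares predicted class indices to the labels. A sketch assuming x_test and y_test form a held-out set and the model outputs one logit per class:

model.eval()
with torch.no_grad():
    preds = model(x_test).argmax(dim=1)           # predicted class per sample
    accuracy = (preds == y_test).float().mean()
print(accuracy.item())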
Regularization prevents overfitting and improves generalization.
nn.Dropout(p=0.5)   # randomly zeroes half of the activations during training

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=1e-4   # L2 penalty on the weights
)
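A minimal sketch of where a Dropout layer usually sits, between the fully connected layers of a small network (the layer sizes are illustrative):

class RegularizedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 64)
        self.drop = nn.Dropout(p=0.5)   # active in train() mode, disabled in eval() mode
        self.fc2 = nn.Linear(64, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.drop(x)
        return self.fc2(x)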
Proper initialization prevents vanishing and exploding gradients.
nn.init.kaiming_normal_(model.fc.weight)   # He initialization, suited to ReLU activations
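To initialize every Linear layer in a model at once, a helper can be passed to Module.apply; init_weights here is an illustrative name:

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight)   # He initialization for the weights
        nn.init.zeros_(m.bias)              # start biases at zero

model.apply(init_weights)   # applies the function to every submodule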
Schedulers dynamically adjust learning rate to improve convergence.
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=10, gamma=0.1   # multiply the learning rate by 0.1 every 10 epochs
)
scheduler.step()   # typically called once per epoch, after optimizer.step()
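In context, the scheduler advances once per epoch, after the inner optimization steps:

for epoch in range(100):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()   # decay the learning rate on the epoch boundary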
CNNs are designed for image data using convolutional filters.
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 32, 3)         # 1 input channel, 32 filters, 3x3 kernel
        self.fc = nn.Linear(32 * 26 * 26, 10)   # assumes 28x28 inputs (e.g. MNIST): 28 - 3 + 1 = 26

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x = x.view(x.size(0), -1)               # flatten to (batch, features)
        return self.fc(x)
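A quick shape check on a dummy batch of 28x28 grayscale images confirms the flattened size used above:

cnn = CNN()
images = torch.randn(8, 1, 28, 28)   # batch of 8 single-channel 28x28 images
print(cnn(images).shape)             # torch.Size([8, 10])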
Transfer learning reuses pre-trained models to reduce training time and data requirements.
from torchvision import models

model = models.resnet18(pretrained=True)   # newer torchvision versions use weights=models.ResNet18_Weights.DEFAULT
for param in model.parameters():
    param.requires_grad = False            # freeze the pretrained backbone
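The frozen backbone is then finished with a new classification head sized for the target task; num_classes here is a placeholder:

num_classes = 5                                                   # illustrative number of target classes
model.fc = nn.Linear(model.fc.in_features, num_classes)           # new head, trainable by default
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)     # train only the new layer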
Gradient clipping prevents exploding gradients in deep networks.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # call after loss.backward(), before optimizer.step()
Mixed precision improves training speed and reduces memory usage.
from torch.cuda.amp import autocast, GradScaler   # moved to torch.amp in newer PyTorch releases

scaler = GradScaler()   # scales losses to avoid float16 gradient underflow
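A sketch of one training step under mixed precision, following the usual autocast/GradScaler pattern (a CUDA device is assumed):

for xb, yb in loader:
    optimizer.zero_grad()
    with autocast():                    # run the forward pass in float16 where safe
        outputs = model(xb)
        loss = criterion(outputs, yb)
    scaler.scale(loss).backward()       # backpropagate the scaled loss
    scaler.step(optimizer)              # unscale gradients and update parameters
    scaler.update()                     # adjust the scale factor for the next step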
Saving models allows reuse and deployment.
torch.save(model.state_dict(), "model.pth")      # save only the learned parameters
model.load_state_dict(torch.load("model.pth"))   # the model architecture must be constructed first
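For resuming training, a common pattern is to checkpoint the optimizer state together with the model weights; the file name is illustrative:

checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}
torch.save(checkpoint, "checkpoint.pth")

state = torch.load("checkpoint.pth")
model.load_state_dict(state["model"])
optimizer.load_state_dict(state["optimizer"])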